Building a new layer

This notebook will guide you through implementing a custom layer in neon, as well as a custom activation function. You will learn:

  • the general interface for defining new layers
  • how to use the Nervana backend functions

Preamble

The first step is to set up our compute backend, and initialize our dataset.


In [ ]:
import neon

# use a GPU backend (switch to 'cpu' if no compatible GPU is available)
from neon.backends import gen_backend
be = gen_backend('gpu', batch_size=128)

# load the MNIST dataset
from neon.data import MNIST

mnist = MNIST(path='data/')
train_set = mnist.train_iter
test_set = mnist.valid_iter

Build your own layer

Instead of importing the neon-supplied Affine layer, we will build our own.

Note: Affine is actually a compound layer; it bundles a linear layer with a bias transform and an activation function. The Linear layer is what implements a fully connected layer.

First, let's build our own linear layer, called MyLinear, and then we will see how to wrap that layer in a compound layer, MyAffine.

There are several important components to a layer in neon:

  • configure: during model initialization, this layer will receive the previous layer's object and use it to set this layer's in_shape and out_shape attributes.
  • allocate: after each layer's shape is configured, this layer's shape information will be used to allocate memory for the output activations from fprop.
  • fprop: forward propagation. Should return a tensor with shape equal to the layer's out_shape attribute.
  • bprop: backward propagation.

In the implementation below, fprop is written with element-wise backend operations inside nested Python loops, so it will be very slow. Try replacing it with the backend's compound_dot function, as is already done in bprop (a sketch of the faster version appears after the cell).


In [ ]:
from neon.layers.layer import ParameterLayer, interpret_in_shape

# Subclass from ParameterLayer, which handles the allocation
# of memory buffers for the output activations, weights, and 
# bprop deltas.
class MyLinear(ParameterLayer):

    def __init__(self, nout, init, name=None):
        super(MyLinear, self).__init__(init, name, "Disabled")
        self.nout = nout


    def __str__(self):
        return "Linear Layer '%s': %d inputs, %d outputs" % (
               self.name, self.nin, self.nout)

    def configure(self, in_obj):
        super(MyLinear, self).configure(in_obj)
        
        # interpret_in_shape returns (# input features, # steps);
        # for a flat (non-recurrent) input like MNIST, nsteps is 1
        (self.nin, self.nsteps) = interpret_in_shape(self.in_shape)
        
        # shape of the output is (# output units, # steps); the batch
        # dimension is handled by the backend buffers, not by out_shape
        self.out_shape = (self.nout, self.nsteps)
        
        # if the weight shape has not been set yet, set it here:
        # this layer's W is a tensor of shape (# outputs, # inputs)
        if self.weight_shape is None:
            self.weight_shape = (self.nout, self.nin)
      
        return self

    def fprop(self, inputs, inference=False, beta=0.0):
        self.inputs = inputs

        # here we compute y = W*X inefficiently using the backend functions
        for r in range(self.outputs.shape[0]):
            for c in range(self.outputs.shape[1]):
                self.outputs[r,c] = self.be.sum(self.W[r,:] * inputs[:,c].T)
                
        # TODO:
        # try substituting the for loops above with the backend `compound_dot` 
        # function to see the speed-up from using a custom gpu kernel!
        # self.be.compound_dot(A=self.W, B=inputs, C=self.outputs)
        
        return self.outputs

    def bprop(self, error, alpha=1.0, beta=0.0):
        
        # to save you the headache, we use the backend compound_dot function
        # here to compute the back-propagated deltas = W^T * error.
        if self.deltas:
            self.be.compound_dot(A=self.W.T, B=error, C=self.deltas, alpha=alpha, beta=beta)
        self.be.compound_dot(A=error, B=self.inputs.T, C=self.dW)
        return self.deltas
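
For reference, here is a sketch of the faster fprop once the nested loops are replaced by the single compound_dot call that the TODO comment suggests (the beta argument lets the backend accumulate into the existing output buffer instead of overwriting it):

    def fprop(self, inputs, inference=False, beta=0.0):
        self.inputs = inputs
        # one matrix-multiply kernel on the backend instead of a Python double loop
        self.be.compound_dot(A=self.W, B=inputs, C=self.outputs, beta=beta)
        return self.outputs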

Optionally, you can wrap the above layer in a compound container that bundles it with a bias, batch normalization, and an activation, just as neon's built-in Affine does with Linear.
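
Below is a rough sketch of such a MyAffine container, assuming neon's CompoundLayer base class (the one the built-in Affine uses), whose add_postfilter_layers() helper appends the optional bias, batch norm, and activation layers after the linear layer we add ourselves:

from neon.layers.layer import CompoundLayer

class MyAffine(CompoundLayer):

    def __init__(self, nout, init, bias=None, batch_norm=False,
                 activation=None, name=None):
        # register the optional post-filter pieces with the container
        super(MyAffine, self).__init__(bias=bias, batch_norm=batch_norm,
                                       activation=activation, name=name)
        # the container is a list of layers: our linear layer goes first...
        self.append(MyLinear(nout, init=init, name=name))
        # ...followed by bias / batch norm / activation, if they were given
        self.add_postfilter_layers()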

Putting together all of the pieces

The architecture here is the same as in the mnist_mlp.py example, except that we use our own MyLinear layer, paired with neon's built-in Rectlin and Softmax activations. If you also want a custom activation function, you could define one and swap it in, as sketched below.
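
A custom activation follows the same pattern as neon's built-in transforms: subclass Transform and implement __call__ (the forward transform) and bprop (its derivative). As a rough sketch, a MySoftmax could look like the following (essentially what the built-in Softmax does; bprop may return 1 because CrossEntropyMulti applies the combined softmax/cross-entropy shortcut derivative in the cost). You could then use Activation(MySoftmax()) in place of Activation(Softmax()) in the cell below.

from neon.transforms.transform import Transform

class MySoftmax(Transform):

    def __init__(self, name=None):
        super(MySoftmax, self).__init__(name)

    def __call__(self, x):
        # subtract the per-column max for numerical stability, then normalize
        return (self.be.reciprocal(self.be.sum(
                self.be.exp(x - self.be.max(x, axis=0)), axis=0)) *
                self.be.exp(x - self.be.max(x, axis=0)))

    def bprop(self, x):
        # the softmax derivative is folded into CrossEntropyMulti's shortcut
        return 1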


In [ ]:
from neon.initializers import Gaussian
from neon.models import Model
from neon.layers.layer import Activation
from neon.transforms.activation import Rectlin, Softmax

init_norm = Gaussian(loc=0.0, scale=0.01)

# assemble all of the pieces
layers = []
layers.append(MyLinear(nout=100, init=init_norm, name="Linear100"))
layers.append(Activation(Rectlin()))

layers.append(MyLinear(nout=10, init=init_norm, name="Linear10"))
layers.append(Activation(Softmax()))

# initialize model object
mlp = Model(layers=layers)

Fit

Train the model using a cross-entropy cost and the gradient descent with momentum optimizer. This will be slow, because our fprop is inefficient; replace fprop with the backend's compound_dot method (see the sketch above) to speed it up.


In [ ]:
from neon.layers import GeneralizedCost
from neon.transforms import CrossEntropyMulti
from neon.optimizers import GradientDescentMomentum
from neon.callbacks.callbacks import Callbacks

cost = GeneralizedCost(costfunc=CrossEntropyMulti())
optimizer = GradientDescentMomentum(0.1, momentum_coef=0.9)
callbacks = Callbacks(mlp, eval_set=test_set)

mlp.fit(train_set, optimizer=optimizer, num_epochs=10, cost=cost,
        callbacks=callbacks)
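
Once training finishes, you can measure how well the model generalizes. For example, using neon's Misclassification metric (this mirrors the check at the end of the mnist_mlp.py example):

from neon.transforms import Misclassification

# fraction of held-out images the trained model labels incorrectly
error_rate = mlp.eval(test_set, metric=Misclassification())
print('Misclassification error = %.1f%%' % (error_rate * 100))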